CD44-SNA1 integrated cytopathology for delineation of high grade dysplastic and neoplastic oral lesions

Oral potentially malignant disorders are highly prevalent and vary widely in severity and risk of malignant transformation, which calls for a point-of-care diagnostic tool. Low patient compliance with biopsy underscores the need for minimally invasive diagnosis. Oral cytology, although well suited to this purpose, is not yet clinically applicable because it lacks definitive diagnostic criteria and its interpretation is subjective. The primary objective of this study was to identify biomarkers for cytology-based delineation of high-risk oral lesions and to evaluate their efficacy. A comprehensive systematic review and meta-analysis of biomarkers recognized a panel of markers (n: 10) delineating dysplastic oral lesions. In this observational cross-sectional study, immunohistochemical validation (n: 131) identified a four-marker panel, CD44, Cyclin D1, SNA-1, and MAA, with the best sensitivity (>75%; AUC>0.75) in delineating benign, hyperplasia, and mild-dysplasia (Low Risk Lesions; LRL) from moderate-severe dysplasia (High Grade Dysplasia: HGD) along with cancer. Independent validation by cytology (n: 133) showed that expression of SNA-1 and CD44 significantly delineated HGD and cancer with high sensitivity (>83%). Multiplex validation in another cohort (n: 138), integrated with a machine learning model incorporating clinical parameters, further improved the sensitivity and specificity (>88%). Additionally, image automation with the SNA-1 profiled data set also provided a high sensitivity (86%). In the present study, cytology with a two-marker panel, detecting aberrant glycosylation and a glycoprotein, provided efficient risk stratification of oral lesions.
Our study indicated that use of a two-biomarker panel (CD44/SNA-1) integrated with clinical parameters or SNA-1 with automated image analysis (Sensitivity >85%) or multiplexed two-marker panel analysis (Sensitivity: >90%) provided efficient risk stratification of oral lesions, indicating the significance of biomarker-integrated cytopathology in the development of a Point-of-care assay.


Introduction
Large-scale screening and surveillance can reduce the burden of oral cancer, a major public health concern with approximately 400,000 new cases every year [1] and an incidence of 20 per 100,000 in the Indian subcontinent [2]. However, despite the easy accessibility of the oral cavity, around 60-80% of oral cancer patients are diagnosed at an advanced stage [2]. Screening programs that aim at detecting Oral Potentially Malignant Disorders (OPMDs) or early-stage cancers have demonstrated a definite reduction in mortality. OPMDs are defined as "any oral mucosal abnormality that is associated with a statistically increased risk of developing oral cancer" [3]. Early-stage cancer encompasses cancer that is classified within stage I and stage II. Biopsy, the current standard of diagnosis, is invasive with poor compliance and is unsuitable as a screening tool in large populations. Multiple non-invasive diagnostic adjuncts are currently available for early detection; however, pathology-based diagnosis is mandatory to delineate high-risk patients [4][5][6][7]. There is, hence, an urgent and unmet need for a minimally invasive, pathology-equivalent, point-of-care (PoC) oral cancer screening tool.
Cytology, a minimally invasive method, is proven to significantly reduce the incidence of cervical cancers and to down-stage them [8]. In contrast, oral cytology lacks definitive diagnostic criteria, and multiple studies, including ours, have indicated a low sensitivity in the diagnosis of oral dysplastic lesions [9][10][11]. Biomarkers representing the carcinogenic processes of cell cycle regulation, signalling, and aberrant glycosylation can be invaluable adjuncts to improve the diagnostic accuracy of oral cytology. Mini-chromosome maintenance protein 2 (McM2), Laminin γ2, EGFR, CD17 and Ki-67 have been explored as markers in oral cytology [12,13]. Lectins, known for their carbohydrate-binding specificities [14,15], have been reported to distinguish cancers of the oral cavity, breast, cervix, and Barrett's oesophagus [16]. Initial studies identified Wheat Germ Agglutinin (WGA) as capable of distinguishing oral cancer and dysplasia in vivo and ex vivo (sensitivity/specificity >80%) [17][18][19]. The evidence points to the significance of biomarkers in improving diagnostic accuracy; however, further studies are essential to assess and validate their clinical applicability in oral cytology.
The spectrum of oral mucosal lesions ranges from benign lesions and OPMDs to invasive cancer, with management varying from conservative treatment to surgical excision. The treatment decision depends on accurate risk stratification of the OPMDs; identification of the markers that achieve this objective will establish their clinical significance. The central hypothesis of this study was that marker-based cytology will delineate oral High-Grade Dysplastic (HGD: Moderate-Severe dysplasia) lesions and cancer and can be automated. The primary objective was to identify and validate markers in cytology towards developing a minimally invasive, PoC cytology method for delineation of Low-Risk lesions (LRL: Benign/Hyperplasia/mild dysplasia) from HGD lesions and oral cancer.

Study design
The study was implemented to identify and validate the best biomarker panel towards improving the diagnostic efficiency of oral cytology in delineating oral cancer/HGD from LRL. An evidence-based article search was conducted to find the markers used in delineating oral dysplastic from non-dysplastic lesions (Fig 1). The selected panel of markers was validated by IHC in histologically annotated tissues. The best markers were then validated in cytology, and the cytology image analysis was automated using artificial intelligence.

Systematic review and marker selection
A literature review of markers wherein immunohistochemistry (IHC) profiling delineates Dysplastic-OPMD (D-OPMD) from Non-Dysplastic-Oral lesions (ND-OL) was carried out. A search strategy combining the terms ("oral premalignant lesions" OR "oral potentially-malignant disorders" OR "oral precancerous" OR "oral dysplasia" OR "oral cancer") and "immunohistochemistry" was used to identify relevant articles (1990 to 2017) in PubMed (https://pubmed.ncbi.nlm.nih.gov/). The Preferred Reporting Items for Systematic Review and Meta-Analysis ). The quality was assessed by Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) [21] and publication bias was assessed by funnel plots. The marker selection was based on three benchmarks: i) a minimum of three studies, ii) identification by high-throughput analysis, and iii) identification of aberrant glycosylation patterns in solid tumors.

Patient cohorts
The study was approved by the Narayana Hrudayalaya Medical Ethics Committee (NHH/MEC-CL-2016-393), Narayana Hrudayalaya, Bangalore, India, and KLE Society's Institute of Dental Science, Bangalore, India. This cross-sectional prospective investigation was carried out as per the guidelines of the Declaration of Helsinki, and its results were reported in accordance with the Strengthening the Reporting of Observational studies in Epidemiology guidelines [22]. The oral epithelial cells (brush biopsy) and tissues were collected from the Head and Neck Oncology clinic, Mazumdar Shaw Medical Center, Narayana Hrudayalaya, Bangalore, India, and the Oral Medicine clinic, KLE, Bangalore, India, after written informed consent (December 2017 to March 2021). Subjects older than 18 years with oral lesion/s clinically suspected to be benign, OPMD, or oral cancer were included in the study. Subjects suffering from any acute illness or debilitating systemic diseases that preclude biopsy were excluded.

Immunohistochemistry validation
Patients were categorized as LRL (benign, hyperplasia, mild-dysplasia), HGD (moderate-severe dysplasia) and carcinoma. The selected markers (S2 Table and S1 Appendix) were validated by IHC in formalin-fixed paraffin-embedded (FFPE) tissue sections in two phases using standard protocols [23]. Briefly, the sections were deparaffinised, incubated with the primary antibody (S2 Table) and detected using the secondary detection system (Dako-Real Envision, K5007). The necessary controls were included (positive/negative). The sections were assessed (400x; 5-10 images/slide; Nikon Eclipse E200) for their intensity (2, 4 or 6) and percentage positivity, and the final score was calculated (percentage positivity x intensity). Staining in the nucleus, cytoplasm, and/or cell membranes indicated positive expression, and scoring was blinded to the histology diagnosis.
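The scoring rule described above (percentage positivity multiplied by intensity) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' scoring script: the function names, the 0-100 scale for percentage positivity, and the averaging of per-image scores into one slide score are assumptions, since the text does not state how the 5-10 images per slide were combined.

```python
from statistics import mean

# Intensity grades reported in the text; any other value is rejected.
ALLOWED_INTENSITIES = (2, 4, 6)


def ihc_score(percent_positive, intensity):
    """Return percentage positivity x intensity for one image."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must lie between 0 and 100")
    if intensity not in ALLOWED_INTENSITIES:
        raise ValueError("intensity must be 2, 4 or 6")
    return percent_positive * intensity


def slide_score(images):
    """Average the per-image scores of one slide (assumed aggregation).

    `images` is a list of (percent_positive, intensity) tuples.
    """
    if not images:
        raise ValueError("at least one image is required")
    return mean(ihc_score(p, i) for p, i in images)
```

Under these assumptions the score ranges from 0 to 600, so a slide with uniformly strong staining in every cell would reach the upper bound.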

Immunocytology validation
Cell culture and maintenance. Cal-27 (originated from moderately differentiated squamous cell carcinoma of the tongue; a gift from the Institute of Bioinformatics, Bangalore) was cultured in DMEM (FBS 10% + antibiotics 1x Penicillin-Streptomycin), while DOK (dysplastic cell line, obtained from RPCI, NY, USA) was cultured in DMEM (FBS 10% + 5 µg/ml hydrocortisone). The cell lines were expanded using standard protocols [23] and used for cytology experiments.
Liquid based cytology and immunocytology. The cells were collected from oral lesions and contralateral normal sites prior to biopsy (S2 Fig and S1 Appendix). In lesions indicated for biopsy (OPMD, cancer), histology diagnosis was considered as the reference standard, while clinical diagnosis was considered in other subjects (benign lesions, normal mucosa). In order to establish the nomogram, cells were collected from the buccal mucosa, tongue and gingiva of healthy subjects without any habits or oral lesions. A cervical cytology brush or Rover Orocellex brush (Rovers Medical Devices B.V, Netherlands) was used to harvest the cells (BD SurePath, BD Biosciences, USA). The cells were centrifuged using the Cytospin™ 4 (Thermo-Scientific; Cat: A78300003, USA). The slides were incubated with the primary antibody (S2 Fig), and staining in the nucleus, cytoplasm, and/or cell membranes was considered positive expression. The slides were visualized (200x; Nikon Eclipse E200, Nikon, USA) and the intensity, pattern of staining and percentage positivity were assessed (10-15 fields/slide). For the fluorescent conjugated-markers, the cells were washed with phosphate buffered saline (PBS), incubated for 15 minutes and counterstained with the nuclear stain DAPI. In multiplex cytology, the cells were incubated with markers sequentially before being counterstained with DAPI. Images were captured (20x objective, Zeiss C, Axiocam; Zen lite 2012), and individual cell pixel intensity was measured (50-70 cells) using ImageJ (https://celldivisionlab.com/2015/08/12/using-imagej-to-measure-cell-fluorescenc/) [24] and compared across the different assays/samples (S1 Appendix). The respective H&E slides (cytology) were interpreted by two oral pathologists blinded to histology diagnosis.

Automated cytology image analysis
Single cell segmentation and extraction of quantitative features. Single epithelial cells (Fig 2) were extracted from single-marker-stained fluorescent images (using the U-Net segmentation model) and classified (Artefact-Net model) after pre-processing as explained previously [25]. Atypical cells [11] with increased nuclear-cytoplasmic ratio, irregular nuclear shape, and abnormal cell shapes were selected from segmented cytology images of the patient. In order to classify these cells, Inception-V3 and Cancer-Net, a Convolutional Neural Network (CNN) model with fewer parameters, were built. Inception-V3 was trained using the last 164 layers, and Cancer-Net was developed with 3x3 CNN filters with ReLU and Batch Normalization for feature extraction, with 4, 8 and 16 filters before initial MaxPooling. A CNN with kernel size 1x1 was used before max-pooling. Three blocks of CNN (27 layers) were used before global … Single epithelial cells were smoothed by a median filter, and images were augmented by random rotation, width shift, height shift, shear range, zoom range, horizontal flip, and vertical flip. The evaluation metrics used were accuracy, F1 score, sensitivity, and specificity. The morphology features (nuclear-cytoplasmic area ratio, diameter ratio, perimeter ratio, major and minor axis ratio, solidity, orientation, eccentricity, and convex area) and marker expression of each cell and nucleus were extracted from single cells using the regionprops function of scikit-image (skimage) in Python. Expression features such as the mean intensity of markers, together with the probability score of the atypical-cell CNN model, were also considered for feature vector generation. The features of all cells of each patient were summarized as statistical aggregates (average, maximum and standard deviation), and these aggregates were used to develop multiple machine learning models.
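The per-patient aggregation step described above (average, maximum and standard deviation of each single-cell feature) could look roughly like the following. This is a minimal sketch using only the Python standard library; the function name, the dictionary layout and the feature names in the usage note are hypothetical, and the use of the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
from statistics import mean, stdev


def aggregate_patient_features(cells):
    """Collapse per-cell features into one feature vector per patient.

    `cells` is a list of dicts, one per segmented cell, each mapping a
    feature name to a numeric value. Every cell must carry the same keys.
    """
    if not cells:
        raise ValueError("a patient needs at least one cell")
    aggregates = {}
    for name in cells[0]:
        values = [cell[name] for cell in cells]
        aggregates[f"{name}_mean"] = mean(values)
        aggregates[f"{name}_max"] = max(values)
        # Sample standard deviation (assumption); undefined for one cell,
        # so fall back to 0.0 in that case.
        aggregates[f"{name}_std"] = stdev(values) if len(values) > 1 else 0.0
    return aggregates
```

For example, calling the function on three cells with hypothetical keys such as `nc_area_ratio` and `cnn_score` yields six aggregate values, which would form one row of the table passed to the downstream classifiers.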

Statistical analysis
The systematic review and meta-analysis were performed using Review Manager 5.3 from the Cochrane Collaboration and MetaDisc Version 1.4. The odds ratio (OR; for dichotomous data) and mean difference (continuous data) were calculated (Forest Plot analysis) to find the association between the marker and dysplasia. Summary estimates were expressed in terms of OR/Summary Area under curve (sAUC). The heterogeneity between the studies was evaluated by the Chi-square test (P<0.05) and I² (>50%) with a random effects model. A sensitivity analysis was performed if heterogeneity was significant.
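For readers unfamiliar with the quantities named above, the following sketch shows how an odds ratio with a Wald-type 95% confidence interval and the I² heterogeneity statistic are conventionally computed. It is illustrative only: the study used Review Manager and MetaDisc rather than custom code, the function names are hypothetical, and the counts in the usage note are made up.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: marker-positive dysplastic      b: marker-positive non-dysplastic
    c: marker-negative dysplastic      d: marker-negative non-dysplastic
    No continuity correction is applied, so all cells must be > 0.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


def i_squared(q, n_studies):
    """I² (%) from Cochran's Q and the number of pooled studies."""
    df = n_studies - 1
    if df <= 0:
        raise ValueError("at least two studies are required")
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100
```

With hypothetical counts of 30, 10, 10 and 30, `odds_ratio` returns an estimate of 9.0; a Cochran's Q of 20 across six studies gives an I² of 75%, which would exceed the 50% threshold used in the text.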
The cytology data (marker expression and morphological parameters) were scaled and used to train different machine learning models. For hyperparameter tuning, 3-fold cross-validation was performed on the training data, and the models were tested on the held-out 30% of the data. The diagnostic performance was calculated for a two-class system (oral cancer/HGD vs. LRL) and was assessed by comparing sensitivity, specificity and AUC in the training and test data. HGD and cancer were considered positive by the reference standard (histology diagnosis). All the machine learning protocols were performed in Python 3.7.4.
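The two-class evaluation described above reduces to counting the four cells of a confusion matrix. The sketch below is a plain-Python illustration, with HGD/cancer coded as 1 and LRL as 0; the function name and the label coding are assumptions made for the example, not taken from the study's code.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels.

    1 = reference-standard positive (HGD or cancer), 0 = LRL.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            raise ValueError("labels must be 0 or 1")
    positives, negatives = tp + fn, tn + fp
    return {
        # Undefined ratios are reported as NaN rather than raising.
        "sensitivity": tp / positives if positives else float("nan"),
        "specificity": tn / negatives if negatives else float("nan"),
        "accuracy": (tp + tn) / len(y_true) if y_true else float("nan"),
    }
```

Computing the same metrics separately on the training and test partitions, as the text describes, makes a gap between the two visible and is one simple way to spot overfitting.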
Sample size. The markers were positive in 70% of OPMD and 96% of oral cancer. Assuming 80% power (alpha error 5%), the minimum required sample size was 32 per group. Accordingly, 40 samples were taken in each cohort for histology/cytology validations. Descriptive statistics were used to summarize the details of patient demography, clinical features and pathology diagnosis. The Kolmogorov-Smirnov test was performed to assess the normal distribution of IHC/cytology scores. All statistical comparisons were made by ANOVA (Kruskal-Wallis test) for multiple groups and Student's t-test for two groups. A P-value <0.05 (2-sided) was considered significant. All statistical analyses were performed using MedCalc 14.8.1.
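The sample size statement above can be checked with a standard formula for comparing two independent proportions. The text does not say which formula or software option was used, so the pooled-variance form below is an assumption; it is shown because, with the stated inputs (70% vs. 96%, two-sided alpha of 5%, 80% power), it yields 32 per group, in agreement with the reported figure. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of two proportions.

    Uses the pooled-variance normal approximation, without continuity
    correction, and rounds up to the next whole subject.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)
```

Other common variants (for example the unpooled formula, or one with a continuity correction) give slightly different numbers, so the match with 32 should be read as consistent with the text rather than as proof of the method used.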

Systematic review and meta-analysis for marker selection
The literature search returned 1165 articles/abstracts (articles: 848; abstracts: 317); the articles related to oral cancer/OPMD were screened by a sequential process and 170 of them were selected. Further, based on the inclusion and exclusion criteria, 80 articles were selected for data extraction. Among these articles, based on specific criteria (S1, S3-S5 Figs, Table 1), 46 articles were included in the quantitative analysis and 14 markers were identified for individual analysis (S1 Table). QUADAS-2 assessment showed a high bias in the selection of patient cohort and interpretation of IHC (S6 and S7 Figs) in the articles. Quantitative analysis indicated that 10/14 markers showed a significant association with dysplastic-OPMD: P53, Ki-67, Podoplanin, CyclinD1, E-cadherin, PCNA, CD44, CDK4, p27 and Syndecan-1 (Table 1).

Immunohistochemical analysis identified the marker profile correlating HGD/cancer
The validation of the 10-marker panel was carried out by IHC in two phases. In phase I, all markers were evaluated for the presence of at least one of the criteria; i) differentiate HGD

CD44 and SNA-1 delineated high grade oral lesions by cytology
Cal-27 (OSCC) and DOK (Dysplastic oral keratinocyte) cells were used to assess the profile of the four markers in addition to standardizing cytology (S11 Fig). In comparison to normal cells of healthy volunteers, SNA-1 and MAA had significantly (p<0.001) higher intensity (ImageJ) in Cal-27 and DOK cells (S11 Fig). CD44 showed high percentage positivity in Cal-27 (90%) and DOK (40%). However, no significant difference was observed in mean intensity or percentage positivity between DOK and Cal-27.

SNA-1 integrated with image analysis in delineating HGD/cancer
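As a final step, we attempted to assess the accuracy of SNA-1, when integrated with image automation, in delineating HGD/cancer. Manual analysis of the SNA-1 data set provided a sensitivity and specificity of 83% and 71%, respectively (S13 Fig). The segmentation (U-Net) and classification (Artefact-Net) models previously developed were used in this study [25]. Lectin-stained cytology datasets (Fig 6A) were used for automation. The classification model was trained using the MAA-FITC dataset (atypical cells: 730; augmented: 5692; normal cells: 1158; augmented: 8093) and tested in the SNA-1 dataset (atypical cells: 53; normal cells: 143). The training for atypical cell classification was performed by Inception-V3 and Cancer-Net (Fig 7; S17 Fig). Inception-V3 gave a high cross-validation sensitivity (90.27%) and specificity (99.8%); however, the model showed lower test sensitivity (SNA-1: 75.47%; S9 Table). The Cancer-Net model showed a similar cross-validation sensitivity (90.81%) and specificity (96.22%), but with a higher test sensitivity (88.6%) and specificity (97.90%) for delineating atypical cells from normal cells (Fig 7; S9 Table).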
The important features (S11 and S12 Tables) identified were average neural-network prediction score of atypical cells (p = 0.001), nuclear-cytoplasmic area ratio (p = 0.016), minimum axis ratio (p = 0.008), and maximum convex area of cells (p = 0.09), selected by tuning the variance inflation factor (VIF) and significance in the logistic regression model. The model gave a high training sensitivity (89%)/specificity (82%) as well as test sensitivity (87%) and specificity (82%).
PCA with Regularized Logistic regression Model: All features were used in this model after dimension reduction with PCA. Ten principal components, L2 regularization, and 'logistic_c' as 2 were selected as the best parameters on cross-validation. The training sensitivity/specificity mirrored the logistic regression model, while the test sensitivity and specificity were 83% and 88%, respectively.
Random Forest Model: The most important features identified were the neural network prediction score, convex area of nucleus, eccentricity of the nucleus, minimum axis ratio, nuclear-cytoplasmic area ratio, and SNA-1 intensity. The model was over-fit, with test sensitivity (83%) and specificity (88%) the same as the regularized logistic regression model (S12 Table).

Discussion
Cytopathology in oral cancer, despite being a minimally invasive assay for early detection, has been challenged by subjectivity in interpretation [11] and lack of well-defined diagnostic criteria. Several methods, including OralCDx, OralCyte and ClearPrep, improved brush biopsy; however, their diagnostic value for dysplasia has been 30-40% [31]. A study from our group reported that liquid-based conventional cytology delineated HGD with 25% sensitivity [11]. The present study addresses the poor diagnostic accuracy of oral cytology and the critical lacunae that limit its wider clinical application by applying molecular markers as an adjunct.
Herein, markers selected from a systematic review were validated sequentially by histochemistry and cytology to arrive at a panel that can improve the accuracy of oral cytology. Our study revealed that Cyclin D1, CD44, MAA and SNA-1 offered a sensitivity of 89% and specificity of 92% in delineating HGD and oral cancer. In cytology, classification models integrating clinical parameters with SNA-1 (detects aberrant glycosylation) and CD44 (glycoprotein) differentiated OSCC/HGD from LRL with high sensitivity/specificity (>84%; n: 272). Multiplexed cytology models as well as SNA-1 integrated with image automation indicated the significance of molecular markers in improving oral cytology-based detection.
Meta-analyses evaluating the diagnostic efficacy of markers assessed by IHC in delineating dysplastic-OPMD are comparatively few. Our study is the first, to the best of our knowledge, to identify biomarkers (n = 10) associated with dysplasia through meta-analysis. A recent meta-analysis identified Ki67 as capable of distinguishing degrees of dysplasia in actinic cheilitis patients, albeit with high heterogeneity [32]. The risk of bias/heterogeneity was high in our study, mostly attributed to variations in IHC scoring method. Further, QUADAS assessment indicated a high risk of bias in the interpretation of index test and applicability in case selection. This was primarily due to a majority of the studies focusing on clinical diagnosis as the reference standard, while our question pertained to delineating dysplasia. This challenge was addressed by carrying out sensitivity analysis to reduce the heterogeneity. After the sensitivity analysis, CD44 and P53, investigated in the maximum number of dysplastic-OPMD subjects (CD44: 526; P53: 897) with a higher AUC (0.80), were carried forward for validation in cytology. A recent meta-analysis identified that P53 overexpression was associated with a two-fold risk of malignant transformation of OPMD [33], while CD44 is a well-established marker in many solid malignancies, promoting tumour cell growth and migration [34]. As observed in our review, multiple studies have reported increased expression in OPMD [35] and with dysplastic progression [23]; however, their relevance in cytology-based assessment is not known.

in the lesion site. The profile shows a significant increase as disease progresses. CD44 (B1-B3) and CyclinD1 (D1-D3) expression parameters, percentage of cells with higher intensity (>4 intensity score out of 6), nuclear positivity and maximum intensity showed significantly less expression in LRL compared to HGD/OSCC. * <0.05; ** <0.005. Graphs represent mean ± standard error. LRL: Low Risk Lesions (Non-Dysplastic and mild dysplastic oral lesion). HGD: High-Grade Dysplasia (Moderate/Severe Dysplasia), OSCC: Oral Squamous cell carcinoma, HRL: High Risk Lesions (OSCC+HGD). SNA-1 (FITC conjugated and DAPI staining, E1-E3; Magnification 20x objective) and CD44 (F1-F3; Magnification 40x objective) staining of cells from OSCC, HGD and benign subjects. Images of OSCC and HGD patients show high staining compared to benign subjects. https://doi.org/10.1371/journal.pone.0291972.g005

Lectins, which detect aberrant glycosylation patterns, have been identified as markers of neoplastic changes in many cancers [17,29,30]. This study identified three lectins (WGA, MAA and SNA-1) reported as differentials in multiple cancers in various studies, including ours [19], wherein local application of WGA on oral lesions significantly differentiated OPMD. However, our current results indicate that WGA (binds to N-acetyl D-glucosamine and sialic acid) could not significantly differentiate HGD from LRL by histochemistry, indicating a difference in histological staining as compared to topical application. MAA is known to specifically bind to alpha-2,3 sialic acid, while SNA-1, the elderberry bark lectin (Sambucus nigra), detects aberrant glycosylation in alpha-2,6 sialic acid; both have been reported previously in the detection of prostate cancer [29,30]. In our study, the SNA-1 staining pattern in cytology significantly increased as the disease progressed from LRL to OSCC, while MAA showed high sensitivity in detection of HGD/OSCC, indicating the significance of specific glycosylation patterns (α2,6 and α2,3 linkages) in oral cancer progression. Although further studies are essential to delineate the exact role of lectins during carcinogenesis, this study clearly identified SNA-1 as a significant adjunct to delineate HGD.
In a recent study [36], oral cytology was shown to have sensitivity and specificity of 79% and 94%, respectively. However, the study cohort had low representation of HGD (12%) and OSCC (6.7%). Biomarker-based oral cytology has been attempted in a few studies; McM2 and Laminin γ2 chain in pre-invasive or invasive squamous cells in brush biopsies showed an increased sensitivity when compared to conventional cytology [13]. A recent study also reported that αvβ6, EGFR, CD17, McM2, Geminin and Ki-67 could differentiate oral cancer and HGD with 78% sensitivity [9]. Our study points to a definite clinical relevance of markers in cytopathology, wherein a two-marker panel of CD44 and SNA-1 improved the efficacy of cytology-based delineation of OSCC/HGD when used individually with clinical parameters (Sensitivity: 83%) and upon multiplexing (Sensitivity: 92%).
The nomogram of biomarker cytology can help to identify significant confounding factor/s given the heterogeneity in the sites of oral cavity and patient clinical parameters.CD44-SNA1 nomogram indicated that CD44 expression correlated with sites and age (high in tongue, gingiva with age >40 years).Interaction of markers with clinical parameters was hence included during feature engineering and development of the integrated classification model.In our study, combining clinical parameters with CD44-SNA1 multiplexed oral cytology profile gave a high sensitivity (92%) and specificity (84%).This data analysis pipeline combining marker profile with clinical parameters, for cytology-based diagnosis of HGD and cancer, has not been attempted previously.
An image automation approach in cytology enables integration of multiple parameters and objective assessment, and thereby improves accuracy. In our previous study [11], we used InceptionV3 for the classification of Hematoxylin-Eosin stained atypical cells from normal cells, with a sensitivity of 73% in HGD-OPMD. The transfer learning model used in the current study showed slight overfitting on the test data. In this study, we used Cancer-Net for feature extraction and classification of atypical cells. The logistic regression model provided the best test sensitivity (87%) and specificity (82%), an improvement on the manual analysis of SNA-1-based oral cytology. The most significant features identified were neural network prediction score, nuclear-cytoplasmic area ratio, minimum axis ratio, and cell convexity. These features [11] were identified as important in the previous study for the classification of cancer/atypical cells by pathologist's interpretation; however, they could not delineate HGD. In our study, Random Forest provided a good training sensitivity/specificity of 95% with bootstrap sampling. Previous studies using the molecular markers EGFR and Ki67 had an accuracy of 70% [37] for delineation of cancer and pre-cancer lesions. The risk stratification machine learning model using logistic regression (L1 regularization) combining morphological parameters and the molecular markers αvβ6, EGFR, CD17, McM2, geminin, and Ki-67 differentiated cancer and HGD from LRL with sensitivity and specificity of 78% and 88%, respectively [9]. Our study provides an improvement on oral cytology with SNA-1-based cytopathology integrated with automated image analysis (Sensitivity: 86%). Automated image analysis of multiplexed images is expected to improve the accuracy further, and studies are currently ongoing to develop the pipeline. The markers were tested in independent cohorts over the course of two phases, and automation produced results that were comparable; nevertheless, the study still requires external validation.
In conclusion, the use of biomarkers in oral cytology improves the accuracy of the approach in risk stratification of high-grade lesions. Our study indicated that the use of biomarkers (CD44/SNA-1) integrated with clinical parameters, or SNA-1 with automated image analysis, improved the accuracy to >85%, while multiplexed two-marker panel analysis further improved it to >90%. Given that implementation of oral cytology on a large scale will be extremely relevant and feasible towards early detection of oral cancer and down-staging of the disease, identifying specific markers and establishing their clinical relevance is a significant step towards developing a pathology-equivalent, point-of-care diagnostic tool.

Fig 1. Study design. Participants were recruited according to inclusion and exclusion criteria, and individuals with oral lesions underwent brush biopsy (for marker-based cytology) followed by an incisional biopsy if indicated (A). Markers identified by systematic review and meta-analysis were validated in tissues (B). The liquid-based molecular cytology was performed with selected markers (C), single and multiplexed, and marker expression was evaluated manually for the classification of oral cancer and HGD from Low-Risk lesions. The cytology image analysis was further automated by segmenting single cells, extracting features of single cells and developing machine learning models (D). HGD: High Grade Dysplasia. https://doi.org/10.1371/journal.pone.0291972.g001

Fig 2. Workflow of image analysis and deep learning models. Fluorescent microscopic cytology images contain epithelial clusters, blood cells, and artefacts along with single epithelial cells (A). Single epithelial cells were segmented using the U-Net model (B) and classified as atypical and normal cells using the Cancer-Net model (C), and quantitative features were extracted for developing the classification model. https://doi.org/10.1371/journal.pone.0291972.g002

stained cytology datasets (Fig 6A) were used for automation. The classification model was trained using the MAA-FITC dataset (Atypical cells: 730; augmented: 5692; normal cells: 1158; augmented: 8093) and tested in SNA-1 dataset (atypical cells: 53; normal cells: 143). The training for atypical cell classification was performed by Inception-V3 and Cancer-Net (Fig 7; S17 Fig). The Inception-V3 gave a high cross-validation sensitivity (90.27%) and specificity (99.8%), however the model showed less test sensitivity (SNA-1: 75.47%: S9 Table

Fig 7. Classification of cells using Cancer-Net model. The Cancer-Net model was employed for classification of segmented single epithelial cells (A). The occlusion maps (visual representation of the regions of interest) showed that the nucleus and the cytoplasm around the nucleus were used by the Cancer-Net model for atypical cell classification. The cell, heat map (occlusion map) and overlay with cell are depicted (B). The training/cross-validation metrics for differentiating normal and atypical cells (C1-C3) showed F1 score above 0.90. https://doi.org/10.1371/journal.pone.0291972.g007

Table 2. Comparison of different models in molecular cytology based delineation of OSCC and HGD. Phase I ICC: Single-marker combination (HRL Vs LRL)
Represents the sensitivity, specificity, predictive values and accuracy of machine learning model in differentiating HRL from LRL (manual analysis-test results) in Phase I and Phase II ICC (multiplex) validation. LRL: Non-Dysplastic oral lesion. HGD: Moderate/Severe Dysplasia, OSCC: Oral Squamous cell carcinoma, HRL: OSCC+HGD. https://doi.org/10.1371/journal.pone.0291972.t002